Excel Kept Messing Up the Names of Genes, So Scientists Renamed Them

I love making spreadsheets. I like lining up little columns of numbers and writing formulas to do things to them. It’s halfway between coding and note-taking. I have sheets for accounts (obviously) but also for projects, holidays, and hobbies. There’s one for the contents of my loft. My New Year’s resolutions? They’re in a spreadsheet. Often, when I start thinking about something, I automatically open a sheet and structure my thoughts into rows and columns. If all you have is a spreadsheet, everything looks like a cell (to misquote Abraham Maslow).

Use Excel for any length of time and you become familiar with its foibles. Type in a phone number and, if you’re unlucky, it’ll turn it into something like 8.E+09. In the best-case scenario, you’ll just lose the leading 0. Sometimes numbers get turned into dates. Sometimes dates get turned into numbers. I’ve got used to seeing #N/A.

These things are annoying, but you get used to them. However, if you’re a geneticist, problems like these plague your industry. Typing most genes into Excel isn’t a problem. “Myosin regulatory light chain interacting protein” is fine (shortened to MYLIP), but type in “Membrane-associated ring-CH-type fingers” (shortened to MARCH1) and Excel recognizes it as a date and “helpfully” converts it to March 1, 2020.

This tickles me. It’s the sort of weird edge case I find amusing. When the first Excel software engineer wrote the feature to scan text and convert certain values to dates, who would have thought that one day it would mess up scientific research documents? I also have a sense of relief that I’m not the only one who has to battle Excel. But this gene formatting, more than an amusing quirk, is actually a surprisingly big issue. “A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions,” scientists wrote in a study four years ago. Indeed, they have been writing about the issues Excel causes them since 2004. This delightfully quirky oddity has been messing up genomics journals for the better part of two decades.
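
The study’s own scanning code isn’t described here, but the basic idea can be sketched. What follows is a minimal, hypothetical Python sketch, not the researchers’ method: it assumes an English-locale Excel has displayed the converted symbols in its default day-month form (MARCH1 as “1-Mar”, SEPT2 as “2-Sep”) and simply flags any entry with that shape. The function name `flag_date_like` and the regular expression are my own inventions.

```python
import re

# Assumption: Excel (English locale) shows an auto-converted symbol such as
# MARCH1 or SEPT2 in a day-month form like "1-Mar" or "2-Sep".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{MONTHS})$", re.IGNORECASE)


def flag_date_like(symbols):
    """Return the entries that look like gene symbols mangled into dates."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


genes = ["MYLIP", "1-Mar", "TP53", "2-Sep", "SEPTIN1"]
print(flag_date_like(genes))  # ['1-Mar', '2-Sep']
```

A real scan would need more patterns than this one (other locales, other date displays), so treat it as a starting point rather than a checker.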

That was until a few weeks ago, when the HUGO Gene Nomenclature Committee (HGNC) decided to rename the problematic genes so that they wouldn’t get converted into dates in Excel. MARCH1 becomes MARCHF1, SEPT1 becomes SEPTIN1, and so on. Put another way: Geneticists got so annoyed with Excel messing up their data that they changed the official scientific names to make them more Excel-friendly.
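
If you have older gene lists lying around, updating them is essentially a find-and-replace job. Here’s a minimal Python sketch. The only renames it is grounded in are the two quoted above; the rule that the number carries over unchanged (MARCHn to MARCHFn, SEPTn to SEPTINn) is my assumption from that “and so on”, so check any real list against HGNC’s own records. `update_symbol` is a hypothetical helper, not part of any library.

```python
import re

# Assumed rule, generalized from MARCH1 -> MARCHF1 and SEPT1 -> SEPTIN1:
# keep the numeric suffix and swap the prefix. Anchored patterns mean
# already-updated symbols (MARCHF1, SEPTIN1) are left alone.
RENAMES = [
    (re.compile(r"^MARCH(\d+)$"), r"MARCHF\1"),
    (re.compile(r"^SEPT(\d+)$"), r"SEPTIN\1"),
]


def update_symbol(symbol):
    """Return the new-style symbol, or the input unchanged if no rule applies."""
    for pattern, replacement in RENAMES:
        if pattern.match(symbol):
            return pattern.sub(replacement, symbol)
    return symbol


print([update_symbol(s) for s in ["MARCH1", "SEPT1", "MARCHF1", "MYLIP"]])
# ['MARCHF1', 'SEPTIN1', 'MARCHF1', 'MYLIP']
```

Because the patterns are anchored, running the update twice does no harm.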

There’s something Kafkaesque about this. The sublime comes crashingly into contact with the banal: Important scientific work, meet Excel formatting. It’s strange to see our individual experiences mirrored on a global scale. You wouldn’t think genetics, as an industry, would have the same problems that I have, as an individual.

Online, after the initial lols and hahas, I’ve spotted three distinct responses to this. Firstly, the “learn to use Excel properly” response. That is: There’s nothing wrong with Excel, the scientists just aren’t using the tool correctly. If they want their data to be kept as it is without formatting, they should add an apostrophe before the value, or they should set the column type to text. It’s their own fault their data is getting messed up, and the whole fiasco is an indictment of the scientific world’s computer proficiency.

Secondly, there’s the “scientists shouldn’t be using Excel anyway” excuse: Excel is too basic a tool for scientists to be using. They should be using Matlab, or R, or some other advanced scripting language or application to handle their data, and then they wouldn’t be having this problem.

And finally, there are the Microsoft haters: This is all Microsoft’s fault for corrupting data. Not only should Excel stop doing this to these specific 27 genes that match dates, but it should also stop doing anything to any data at all, all of the time. Excel is a scourge on humanity and we should join forces with the scientists to mount an attack on Microsoft and get them to change their ways. In this response, the whole fiasco is an indictment of the poor state of software usability.

I have sympathy for all of these views, but the truth surely lies somewhere in the middle. When the Human Genome Organization (HUGO) made this change, it was because everyone was trapped between a rock and a hard place: between scientists and lab assistants with wildly different computer skills, and between geneticists and software that has to stay backward compatible.

Many scientists will, of course, know about data formats and how to stop their data from being converted to dates. But still, accidents creep in. Tables get saved in CSV format, loaded into Excel again, and corrupted. Junior researchers forget. Something will happen that lets the problem creep back in. “It’s really, really annoying,” one geneticist told The Verge. It’s the data formatting that broke the researchers’ back.
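
The CSV round trip is worth a closer look, because it explains why careful formatting doesn’t stick. A CSV file is plain text and records no column types, so a column formatted as text in the spreadsheet is just a list of words on disk, and whatever opens the file next gets to guess again. A small illustration with Python’s standard `csv` module (the symbols are just examples) shows that the file itself keeps the names intact, and that a reader which doesn’t guess types gets them back unchanged:

```python
import csv
import io

rows = [["symbol"], ["MARCH1"], ["SEPT1"], ["MYLIP"]]

# Write the table to an in-memory CSV "file". Only the text is stored;
# there is nowhere to record that the column should stay as text.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
saved = buffer.getvalue()
print(saved.splitlines())  # ['symbol', 'MARCH1', 'SEPT1', 'MYLIP']

# The csv module reads every field back as a plain string, so nothing
# turns into a date here; conversion happens only in software that
# re-opens the file and guesses what each value is.
reloaded = list(csv.reader(io.StringIO(saved)))
print(reloaded == rows)  # True
```

That’s why the problem keeps creeping back: the protection lives in the spreadsheet’s formatting, not in the data.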

For Microsoft, this is a strange edge case. These 27 genes just coincidentally match strings that could be read as dates. And to be fair to Microsoft: The names of the months came first. (Indeed, when Excel was first written, these genes hadn’t been named.) Perhaps there is a world where this issue gained publicity and Microsoft released a new version of Excel with the date-parsing code changed to explicitly avoid converting these to dates. But that’s fiddly and complicated, and even if Microsoft made that update, it would take years to have any impact as universities around the world gradually renewed their Microsoft software enterprise agreements and updated to the latest version of Excel. More likely, if Microsoft had even been alerted to this issue, they would have just sent a link to the relevant KB article.

And so geneticists had to choose which side of Bernard Shaw’s famous witticism they fell on: whether they were the reasonable ones who adapted to the world or the unreasonable ones who persisted in trying to adapt the world to themselves. They adapted themselves.

There are interesting political points here about the relative powers of these two entities. Maybe a point about a sort of widespread, low-level incompetence on the part of humans when it comes to computers. Or about Excel itself. Nearly a decade ago, Joel Spolsky, a former Excel program manager at Microsoft, pointed out that “most Excel users never enter a formula. They use Excel when they need a table. The gridlines are the most important feature of Excel, not recalc.”

The criticism is focused on Microsoft because Excel has become the genericized trademark for spreadsheets. But the same problem occurs in Google Sheets, and so even if Microsoft did change Excel, the problem wouldn’t go away. For completeness’s sake, I also tried importing genes into Numbers, Apple’s spreadsheet software, and found it doesn’t reformat MARCH1 into a date. While this would be great for geneticists, I can’t help wondering if this lack of automatic format detection is one of the reasons Numbers isn’t more popular.

I’ve become fascinated by this whole debacle, as it seems to stand for something much larger: our own powerlessness, and even the powerlessness of whole industries, in the face of technology.

I find myself thinking about how software — limited and difficult to use, often unsuitable to the task, fragile — has spread around the world, pervading and invading every facet of every industry. You can’t get away from software. A computer on every desk, and in every house, yes, but also in every pocket, in every shop and office. A computer behind every action and thought. And we can no more change software to work for our industry than we can alter the shifting of the tides. Off-the-shelf apps are a force of nature. Research scientists work around software limitations in the same way sailors work around tidal charts.

I’ve downloaded spreadsheets of gene data, meaningless to me, just to play with them and spot the errors. It’s a game of Where’s Waldo for incorrectly formatted genes. As I find myself philosophizing about this, I remember that the industry has to keep going. The whole thing is ludicrous, of course, but the HGNC has made a sensible, pragmatic decision, one that has pleased geneticists, to deal with what was, essentially, an unfortunate, if amusing, naming clash.

There’s a funny sort of epilogue to all of this. Scrolling through lists of genes, I’ve caught sight of some other names. And, sometimes, they are just weird. One gene is named “Sonic Hedgehog,” named partly after the video game character and the band Sonic Youth. Another is called “Bag of Marbles.” And there’s also Cheap Date, Buttonhead, and Dunce. There are lots of names like these. This all seems like a bit of a laugh until you’re the doctor trying to sensitively tell a parent their child has a serious health worry and you have to explain with solemnity they have a mutation in their One-Eyed Pinhead.

Before reading about these, my ignorance of genes led me to believe the names were carefully crafted by scientists, and it was something of an outrage that they had to be renamed for as trivial a reason as Excel. Now, it’s hard to believe that scientists are the adults in the room. Online, I’ve seen outrage about the Excel change, but I realize that it has all been on behalf of geneticists rather than from geneticists themselves, who, on the whole, seem relieved. Perhaps there’s another, more human point here about our assumptions. I think one of the reasons this Excel story captures the imagination is the sanctity of science and our assumption that scientists are logical, research-driven individuals, rather than people who, like all of us, joke around and, given half a chance, give genes silly names. And, like all of us, they are just trying to do the best they can with the software they have.

Source: https://onezero.medium.com/excel-kept-messing-up-the-names-of-genes-so-scientists-renamed-them-4bd33859abfb
